Text Line Extraction in Historical Documents Using Mask R-CNN

Authors

Abstract

Text line extraction is an essential preprocessing step in many handwritten document image analysis tasks. It includes detecting text lines and segmenting the regions of each detected line. Deep learning-based methods are frequently used for text line detection. However, only a limited number of methods tackle the problems of detection and segmentation together. This paper proposes a holistic method that applies Mask R-CNN to text line extraction. A model is trained to extract text line fractions from patches, which are further merged to form the text lines of an entire page. The presented method was evaluated on two well-known datasets of historical documents, DIVA-HisDB and ICDAR 2015-HTR, and achieved state-of-the-art results. In addition, we introduce a new challenging dataset of Arabic manuscripts, VML-AHTE, where numerous diacritics are present. We show that the Mask R-CNN-based method can successfully segment text lines, even in such a scenario.
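The patch-then-merge idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the patch size, the stride, the OR-based merging rule, and the `threshold_predictor` stand-in (used here in place of a trained Mask R-CNN) are all assumptions made for illustration. A real pipeline would also need to link the per-patch line instances into whole lines, which this sketch does not attempt.

```python
from typing import Callable, List

Grid = List[List[int]]  # 2D binary image: 1 = text-line pixel, 0 = background


def patch_origins(size: int, patch: int, stride: int) -> List[int]:
    """Start offsets of patches along one axis, always covering the far edge."""
    if size <= patch:
        return [0]
    origins = list(range(0, size - patch + 1, stride))
    if origins[-1] != size - patch:
        origins.append(size - patch)  # extra patch so the border is covered
    return origins


def merge_patch_masks(
    page: Grid,
    predict: Callable[[Grid], Grid],
    patch: int = 4,
    stride: int = 2,
) -> Grid:
    """Run `predict` on overlapping patches and OR the masks into a page mask."""
    height, width = len(page), len(page[0])
    merged = [[0] * width for _ in range(height)]
    for top in patch_origins(height, patch, stride):
        for left in patch_origins(width, patch, stride):
            crop = [row[left:left + patch] for row in page[top:top + patch]]
            mask = predict(crop)
            for dy, mask_row in enumerate(mask):
                for dx, value in enumerate(mask_row):
                    if value:
                        merged[top + dy][left + dx] = 1
    return merged


def threshold_predictor(crop: Grid) -> Grid:
    """Hypothetical stand-in for a trained segmentation model: marks ink pixels."""
    return [[1 if pixel > 0 else 0 for pixel in row] for row in crop]


if __name__ == "__main__":
    # Toy 6x7 "page" with two short horizontal strokes.
    page = [
        [0, 0, 0, 0, 0, 0, 0],
        [0, 9, 9, 9, 9, 9, 0],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 7, 7, 7, 0, 0, 7],
        [0, 0, 0, 0, 0, 0, 0],
    ]
    for row in merge_patch_masks(page, threshold_predictor):
        print("".join(str(v) for v in row))
```

Overlapping patches are used so that a stroke cut by one patch border is seen whole in a neighbouring patch; the overlap amount is a tunable assumption rather than a value taken from the paper.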


Similar resources

Using Scale-Space Anisotropic Smoothing for Text Line Extraction in Historical Documents

This paper presents a novel approach for text line extraction which is based on Gaussian scale space, a dedicated binarization, and an energy minimization framework. It enhances the text lines in the image using multi-scale anisotropic second derivative of Gaussian filter bank at the average height of the text line. It then applies a binarization, which is based on component-tree and is tailore...


Prostate segmentation and lesions classification in CT images using Mask R-CNN

Purpose: Non-cancerous prostate lesions such as prostate calcification, prostate enlargement, and prostate inflammation cause many problems for men’s health. This research proposes a novel approach, a combination of image processing techniques and deep learning methods, for classification and segmentation of the prostate in CT-scan images by considering experienced physicians’ reports. ...


Text Line Extraction from Complex Layout Documents

Numerous stylish documents do not follow traditional text layouts, and their printed text regions are not parallel to each other. Such complex layouts make text line extraction challenging due to the multiple orientations of paragraphs. This paper introduces a system for text line extraction from complex layout documents. The proposed method is based on the concept of dilation and hist...


Text line extraction for historical document images

0167-8655/$ - see front matter. 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.patrec.2013.07.007. Corresponding author at: Department of Computer Science, Triangle Research & Development Center, Kafr Qarea, Israel. Fax: +972 4 6356168. Authors: R. Saabni, A. Asi, J. El-Sana. These authors contribut...


Text Extraction from Historical Handwritten Documents by Edge Detection

Many national archives or libraries keep large amounts of historical handwritten documents. One problem that many archivists are facing is the seeping of ink through the pages of certain double-sided handwritten documents after long periods of storage. The result is that the handwritten characters from the reverse side appear as noise on the front side and even interfere with the front side char...



Journal

Journal title: Signals

Year: 2022

ISSN: 2624-6120

DOI: https://doi.org/10.3390/signals3030032